German questions are welcome
Will the talk help me?
Transition from sequential code to asynchronous with Futures
Javascript / E / .Net
Max Maischein
DZ BANK Frankfurt
Deutsche Zentralgenossenschaftsbank
Data Scientist
Internal search engine
Web crawler / Intranet crawler -> Windows
Why linear? 'Cause it's easy.
preemptive multitasking
It's not the matter of how well the bear dances
... but that it dances at all
Only understandable with the picture
Not pictured: HTTP state machine
nested state machines are hard
AnyEvent
Callback Hell
Javascript
async
/ await
A pair of keywords to write asynchronous code in a "nicer" way
Future::AsyncAwait
Perl 5.16+
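As a minimal sketch of what the keyword pair looks like in Perl, assuming the Future and Future::AsyncAwait modules from CPAN are installed. `add_later` is a hypothetical function made up for this illustration, and the Future it awaits is already complete, so no event loop is needed:

```perl
use strict;
use warnings;
use Future;
use Future::AsyncAwait;

# Hypothetical example function: "async sub" makes it return a Future
async sub add_later {
    my ($x, $y) = @_;
    # "await" suspends this sub until the Future is complete
    # (here it already is) and yields its result
    my $sum = await Future->done( $x + $y );
    return $sum;
}

my $result = add_later( 2, 3 )->get;
print "$result\n";
```

An `async sub` always returns a Future, so the caller either awaits it or calls ->get() on it.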
Queue - a Queue
our @queue;
UA - User Agent ("Requester", "Browser"):
LWP::UserAgent
$ua->get($url);
Future
my $ua = Future::HTTP->new();
$ua->http_get($url);
Extractor
e.g. HTML::Selector::XPath
... or regular expressions
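A rough sketch of the regular expression variant, using only core Perl. `extract_links` is a hypothetical helper name, and the pattern only handles quoted href attributes on `<a>` tags; it is meant as an illustration, not as a robust HTML parser:

```perl
use strict;
use warnings;

# Hypothetical helper: return the href values of all <a> tags in $html
sub extract_links {
    my ($html) = @_;
    my @links;
    # $1 is the quote character, $2 the attribute value
    while( $html =~ /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gis ) {
        push @links, $2;
    }
    return @links;
}

my $html = q{<p><a href="/a.html">A</a> <a class="x" href='http://example.com/b'>B</a></p>};
print "$_\n" for extract_links( $html );
```

Relative links such as `/a.html` would still have to be resolved against the URL of the page before they can go back into the queue.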
our %seen;
our @queue = @ARGV;
...
do {
    my $url = shift @queue;
    #sleep 10;
    fetch_and_extract( $url );
} while (@queue);
print for @results;
Ideally the same API, just with ->get() tacked on:
my $response = $ua->get($url);
becomes
my $response_future = $ua->get( $url ); # start
# do something else
my $response = $response_future->get(); # Use result
Ideally like the synchronous API, just with "->get()" appended:
my $response = $ua->get($url);
becomes
my $response_future = $ua->get( $url )->then(sub {
    my ($response) = @_;
    # Use result
}); # start
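The ->then() chaining can be tried without any HTTP client. The following sketch assumes only the Future module from CPAN; `$request` stands in for the Future a user agent would return and is completed by hand, and the string "hello" is a placeholder response. Note that Future expects the ->then() callback to return another Future:

```perl
use strict;
use warnings;
use Future;

# Stand-in for the Future an asynchronous user agent would return
my $request = Future->new;

# Nothing runs yet; the callback fires once $request completes
my $length_future = $request->then(sub {
    my ($body) = @_;
    # a ->then callback has to return a Future again
    return Future->done( length $body );
});

# ... do something else, then the "response" arrives
$request->done( "hello" );

print $length_future->get, "\n";
```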
Module: Future
->get()
calls the backend framework
Still a state machine
... but hidden
"await" in front of Future
"await" hides the ->get() call
my $response_future = $ua->get( $url ); # start
my $response = $response_future->get(); # Use result
becomes
my $response = await $ua->get( $url );
my $f = $ua->get();
$f->then( sub {
    my( $response ) = @_;
    print "Have result, let's go\n";
});
becomes
my $response = await $ua->get();
print "Have result, let's go\n";
# new_delay passes its arguments on to AnyEvent->timer
my $f = AnyEvent::Future->new_delay( after => 5 );
$f->then( sub {
    my( $response ) = @_;
    print "Woke up, let's go\n";
});
becomes
my $f = await AnyEvent::Future->new_delay( after => 5 );
print "Woke up, let's go\n";
use Future;
use Future::Utils 'repeat';

our %seen;
our @queue = @ARGV;
...
my %pending;             # URLs whose request is still in flight
my $done = Future->new;  # completed with the results at the end
repeat {
    my $url = shift @queue;
    # await sleep(10);
    $pending{ $url } = 1;
    fetch_and_extract( $url )->then(sub {
        delete $pending{ $url };
        Future->done
    });

    my $next_action;
    if( @queue ) {
        $next_action = Future->done(1)
    } elsif( scalar keys %pending ) {
        $next_action = Future->done(1)
    } else {
        $done->done(@results);
        $next_action = Future->done(0) # nothing queued or pending: stop
    }
    $next_action
} while => sub { $_[0]->get };
@results = $done->get;
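How `repeat` from Future::Utils drives such a loop can be seen in isolation in the following sketch. It assumes only the Future distribution from CPAN; the queue items and the `@seen` list are placeholders, and every Future is completed immediately, so no event loop is involved:

```perl
use strict;
use warnings;
use Future;
use Future::Utils 'repeat';

my @queue = qw( a b c );   # placeholder work items
my @seen;

my $loop = repeat {
    my $item = shift @queue;
    push @seen, $item;
    # The result of this Future decides whether the loop continues:
    # the number of remaining items is true until the queue is empty
    Future->done( scalar @queue );
} while => sub { $_[0]->get };

$loop->get;
print "@seen\n";
```

The block has to return a Future on every iteration, and the `while` callback receives that Future once it is complete.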
Questions?
                  fork                async
CPU               any number of CPUs  1 CPU
Crash             doesn't matter      fatal
Race conditions   yes                 rather not
Framework         Parallel::FM        Future / AnyEvent / Mojolicious / ...
Global variables  no                  yes
Implementation    trivial             rewrite
avoid IPC
Use Dancer and WWW::Mechanize::Chrome in the same process
async
makes it clear where control can be yielded
Only to the direct caller
Coro does a lot more gymnastics
In return it can also do a lot more
I don't want/need that
The queue should be persistent
Store request and response (WARC)
Queue -> In flight -> Extractor -> WARC -> Results
  ^+---- retry ---+        |
  ^+---- new links --------+
If you know all the questions that will be asked in the future, why are you a programmer?
We want to reduce the number of requests again
5 seconds before the next request is sent
sleep 5;